Short-Text Similarity Measurement Using Word Sense Disambiguation and Synonym Expansion
Abstract
Measuring the similarity between text fragments at the sentence level is made difficult by the fact that two sentences that are semantically related may not contain any words in common. This means that standard information retrieval (IR) measures of text similarity, which are based on word co-occurrence and designed to operate at the document level, are not appropriate. While various sentence similarity measures have been proposed recently, these measures do not fully utilise the semantic information available from lexical resources such as WordNet. In this paper we propose a new sentence similarity measure that uses word sense disambiguation and synonym expansion to provide a richer semantic context for measuring sentence similarity. Evaluation of the measure on three benchmark datasets shows that, as a stand-alone sentence similarity measure, the method achieves better results than other methods recently reported in the literature.
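The abstract's central observation, that two related sentences may share no words and therefore score zero under word co-occurrence measures, can be illustrated with a small sketch. This is not the authors' measure: a hypothetical toy sense inventory (`SENSES`) stands in for WordNet, a simplified Lesk gloss-overlap step stands in for whatever disambiguation algorithm the paper uses, and plain cosine similarity over synonym-expanded bags of words stands in for the paper's scoring function. The names `SENSES`, `disambiguate`, `expand`, `cosine` and `similarity` are all hypothetical.

```python
from collections import Counter
import math

# Hypothetical toy sense inventory standing in for WordNet (assumption).
# Each word maps to candidate senses; each sense lists its synonyms and a
# "signature" of gloss words used by the simplified Lesk step below.
SENSES = {
    "bank": [
        {"synonyms": {"depository", "financial_institution"},
         "signature": {"money", "deposit", "loan", "account"}},
        {"synonyms": {"riverbank", "shore"},
         "signature": {"river", "water", "slope", "fish"}},
    ],
    "car": [{"synonyms": {"automobile", "auto", "motorcar"},
             "signature": {"drive", "road", "engine", "wheels"}}],
    "automobile": [{"synonyms": {"car", "auto", "motorcar"},
                    "signature": {"drive", "road", "engine", "wheels"}}],
    "purchase": [{"synonyms": {"buy"}, "signature": {"money", "pay", "price"}}],
    "buy": [{"synonyms": {"purchase"}, "signature": {"money", "pay", "price"}}],
}


def disambiguate(word, context):
    """Simplified Lesk: choose the sense whose signature overlaps the context most."""
    senses = SENSES.get(word)
    if not senses:
        return None
    return max(senses, key=lambda s: len(s["signature"] & context))


def expand(sentence):
    """Tokenise, disambiguate each word, then add only the chosen sense's synonyms."""
    tokens = sentence.lower().split()
    context = set(tokens)
    expanded = list(tokens)
    for tok in tokens:
        sense = disambiguate(tok, context - {tok})
        if sense is not None:
            expanded.extend(sorted(sense["synonyms"]))
    return Counter(expanded)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = sum(v * v for v in a.values())
    nb = sum(v * v for v in b.values())
    return dot / math.sqrt(na * nb) if na and nb else 0.0


def similarity(s1, s2, use_expansion=True):
    """Compare two sentences with or without WSD-guided synonym expansion."""
    if use_expansion:
        return cosine(expand(s1), expand(s2))
    return cosine(Counter(s1.lower().split()), Counter(s2.lower().split()))


if __name__ == "__main__":
    a, b = "i purchase a car", "we buy an automobile"
    print(similarity(a, b, use_expansion=False))  # 0.0: no shared words
    print(similarity(a, b))                       # 0.75 after expansion
```

Without expansion the two example sentences share no tokens and score 0.0; after disambiguation and synonym expansion each sentence has eight distinct terms, six of them shared, giving 0.75. Disambiguating before expanding matters for polysemous words such as "bank": expanding every sense would add unrelated synonyms like "shore" to a sentence about money.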
Similar resources
Examining the Role of Homograph Context Types in Determining Similarity Between Documents
Aim: Automatic information retrieval is based on the assumption that texts contain content or structural elements that can be used in word sense disambiguation, thereby improving the effectiveness of the retrieved results. Homographs are among the words requiring sense disambiguation. Depending on their roles and positions in texts, homograph contexts could be divided into different types, wit...
Improve Lexicon-based Word Embeddings By Word Sense Disambiguation
There have been some works that learn a lexicon together with the corpus to improve the word embeddings. However, they either model the lexicon separately but update the neural networks for both the corpus and the lexicon by the same likelihood, or minimize the distance between all of the synonym pairs in the lexicon. Such methods do not consider the relatedness and difference of the corpus and...
PUTOP: Turning Predominant Senses into a Topic Model for Word Sense Disambiguation
We extend McCarthy et al.’s predominant sense method to create an unsupervised method of word sense disambiguation that uses topics derived automatically with Latent Dirichlet allocation. Using topic-specific synset similarity measures, we create predictions for each word in each document using only word frequency information. It is hoped that this procedure can improve upon the method for l...
Merging Word Senses
WordNet, a widely used sense inventory for Word Sense Disambiguation (WSD), is often too fine-grained for many natural language applications because of its narrow sense distinctions. We present a semi-supervised approach to learn similarity between WordNet synsets using a graph-based recursive similarity definition. We seed our framework with sense similarities of all the word-sense pairs, learn...
Lexical ambiguity and Information Retrieval revisited
A number of previous experiments on the role of lexical ambiguity in Information Retrieval are reproduced on the IR-Semcor test collection (derived from Semcor), where both queries and documents are hand-tagged with phrases, Part-Of-Speech and WordNet 1.5 senses. Our results indicate that a) Word Sense Disambiguation can be more beneficial to Information Retrieval than the experiments of Sand...
Publication date: 2010